Big Picture: Why These Data Structures Exist

Reality: key spaces are astronomical; real datasets are sparse. Direct arrays for the whole universe are only viable when \(U \approx n\); in practice \(n \ll U\). Indexing structures are engineered responses to sparsity plus required query patterns.
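To make the scale concrete, a quick sketch (the specific numbers are illustrative assumptions, not from the text): a 64-bit key universe against a million-key dataset.

```python
# Illustrative numbers: a 64-bit key universe vs. a million-key dataset.
U = 2 ** 64          # size of the key universe (all 64-bit integer keys)
n = 1_000_000        # keys actually present in a typical dataset

density = n / U      # fraction of the universe that is occupied
print(f"{density:.3e}")   # ~5.4e-14: effectively empty

# A direct-address array would need U slots; a dict stores only the
# n keys that actually occur, regardless of how big U is.
sparse_index = {k * 7919: "present" for k in range(n)}
print(len(sparse_index))  # 1_000_000 entries, independent of U
```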

Compression via Mapping

Hash Tables: Hash functions map a huge, sparse key universe into a compact bucket array, trading perfect direct addressing for expected \(O(1)\) operations at the cost of collision handling and periodic resizing.
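A minimal sketch of that mechanism with separate chaining (the class name `ChainedHashMap` and the 0.75 load threshold are illustrative assumptions, not from the text):

```python
# Sketch: chained hash table with load-factor-driven resizing.
class ChainedHashMap:
    _MAX_LOAD = 0.75  # illustrative threshold for triggering a resize

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _index(self, key):
        # hash() compresses the huge key universe into a small bucket range
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # same key: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # colliding keys share a chain
        self._size += 1
        if self._size / len(self._buckets) > self._MAX_LOAD:
            self._resize()

    def get(self, key, default=None):
        for k, v in self._buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def _resize(self):
        # Double the bucket array and rehash every entry; amortized
        # over many inserts this keeps operations expected O(1).
        new_cap = len(self._buckets) * 2
        new_buckets = [[] for _ in range(new_cap)]
        for bucket in self._buckets:
            for k, v in bucket:
                new_buckets[hash(k) % new_cap].append((k, v))
        self._buckets = new_buckets
```

Chaining keeps each operation proportional to the chain length, which resizing holds at a small constant on average.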

Order Over Sparsity

BST / Balanced Trees: Maintain an ordered subset, enabling ranges, predecessor/successor, sorted iteration. Balancing keeps height \(O(\log n)\) despite arbitrary key distribution.
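As a sketch of what ordering buys (function names like `range_query` are illustrative, not from any library), an unbalanced BST answers a range query by pruning every subtree that lies outside the interval:

```python
# Sketch: unbalanced BST with an ordered range query.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def range_query(root, lo, hi, out):
    # In-order walk that skips subtrees entirely outside [lo, hi]
    if root is None:
        return
    if lo < root.key:
        range_query(root.left, lo, hi, out)
    if lo <= root.key <= hi:
        out.append(root.key)
    if root.key < hi:
        range_query(root.right, lo, hi, out)

root = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, k)
result = []
range_query(root, 25, 65, result)
print(result)  # [30, 40, 50, 60] -- already in sorted order
```

The pruning is what a hash table cannot do: bucket order carries no information about key order.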

Trade‑off Surface

We choose structures by operation mix, range needs, memory & cache locality, variance tolerance, and worst‑case robustness.

Direct addressing of size \(U\) is optimal only when the universe is dense. Everything else is a layer of engineering that exploits the fact that most possible keys never appear.
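For contrast, a case where direct addressing is the right tool (a hypothetical byte-counting example): when \(U\) is small and dense, one slot per possible key is cheap and collision-free.

```python
# Direct addressing works when the universe is dense: counting byte
# values (U = 256) in a stream, where every key may plausibly appear.
counts = [0] * 256                 # one slot per possible key: O(U) space
for b in b"hello world":
    counts[b] += 1                 # O(1) access, no hashing, no collisions
print(counts[ord("l")])  # 3
print(counts[ord("o")])  # 2
```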

Summary & Key Takeaways

Binary Search Tree

Sparse ordered subset; shape depends on insertion order.

Avg: \(O(\log n)\)

Worst: \(O(n)\)

✓ Ordering & range queries

✓ Successor / predecessor

✗ Skews if unbalanced
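The skew is easy to demonstrate: the same 15 keys produce a height of 15 or of 4 depending purely on insertion order (helper names below are illustrative).

```python
# Sketch: insertion order alone decides the shape of an unbalanced BST.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

sorted_order = list(range(15))            # worst case: degenerates to a list
level_order = [7, 3, 11, 1, 5, 9, 13,     # perfect-tree insertion order
               0, 2, 4, 6, 8, 10, 12, 14]
print(height(build(sorted_order)))  # 15 -> O(n) behavior
print(height(build(level_order)))   # 4  -> O(log n) behavior
```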

AVL Tree

Strict height balance (balance factor \(|bf| \le 1\) at every node), maintained via rotations.

Avg: \(O(\log n)\)

Worst: \(O(\log n)\)

✓ Tight height bound

✓ Predictable latency

✗ Rotation overhead
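A minimal AVL insertion sketch, assuming the standard four rotation cases (Left-Left, Right-Right, Left-Right, Right-Left); all names are illustrative. Note that the sorted input that degenerates a plain BST stays at logarithmic height here.

```python
# Sketch: AVL insertion; rebalance via rotations when |bf| > 1.
class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.h = key, None, None, 1

def h(n): return n.h if n else 0
def bf(n): return h(n.left) - h(n.right)
def update(n): n.h = 1 + max(h(n.left), h(n.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    update(root)
    b = bf(root)
    if b > 1 and key < root.left.key:      # Left-Left
        return rotate_right(root)
    if b < -1 and key > root.right.key:    # Right-Right
        return rotate_left(root)
    if b > 1:                              # Left-Right
        root.left = rotate_left(root.left)
        return rotate_right(root)
    if b < -1:                             # Right-Left
        root.right = rotate_right(root.right)
        return rotate_left(root)
    return root

root = None
for k in range(15):        # fully sorted input
    root = insert(root, k)
print(root.key, root.h)    # 7 4 -- height stays O(log n)
```

The rotation cost per insert is \(O(1)\) rotations plus \(O(\log n)\) height updates, which is the "rotation overhead" paid for the tight bound.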

Hash Table

Maps sparse keys to dense buckets via hash.

Avg: \(O(1)\)

Worst: \(O(n)\)

✓ Fast point lookups

✗ No ordering

✗ Poor range support
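The "poor range support" is structural, not incidental: bucket positions carry no key-order information, so a range query over a hash map must scan all \(n\) entries. A sketch (using a sorted list via `bisect` as a stand-in for a BST):

```python
import bisect

# Hash map: O(1) expected point lookup...
data = {k: k * k for k in [17, 3, 42, 8, 25]}
print(data[42])  # 1764

# ...but a range query [5, 30] must touch every key: O(n)
in_range = sorted(k for k in data if 5 <= k <= 30)
print(in_range)  # [8, 17, 25]

# An ordered structure prunes straight to the matching slice instead.
keys = sorted(data)
lo = bisect.bisect_left(keys, 5)
hi = bisect.bisect_right(keys, 30)
print(keys[lo:hi])  # [8, 17, 25], found in O(log n + k)
```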

When to Use Which?

Tree-Based

Need ordering, ranges, rank, sorted iteration.

Hash Table

Dominated by point lookups / inserts / deletes.

Further Learning

More sparse indexing structures:

Red-Black Trees, B / B+ Trees, Tries / Radix Trees, Order-Statistic Trees, Cuckoo / Hopscotch Hashing, Bloom / Cuckoo Filters.

Each optimizes a dimension: I/O, cache lines, false positives, variance, or concurrency.
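As one taste of the false-positive dimension, a minimal Bloom filter sketch (the class name and the parameters \(m = 1024\), \(k = 3\) are illustrative assumptions): it answers membership with no false negatives, a small false-positive rate, and memory far below one slot per key.

```python
import hashlib

# Sketch: Bloom filter -- m bits, k hash positions per item.
class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0  # a Python int used as an m-bit array

    def _positions(self, item):
        # Derive k independent positions by salting a sha256 digest
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # False positives possible; false negatives impossible.
        return all(self.bits >> p & 1 for p in self._positions(item))

bf = BloomFilter()
for word in ("apple", "banana", "cherry"):
    bf.add(word)
print(bf.might_contain("apple"))   # True, guaranteed
```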